Due to the widespread use of complex machine learning models in real-world applications, explaining model predictions has become critical. However, these models are typically black-box deep neural networks, explained post hoc via methods with known faithfulness limitations. Generalized Additive Models (GAMs) are an interpretable class of models that address this limitation by learning a non-linear shape function for each feature separately, followed by a linear model on top. However, these models are typically difficult to train, require many parameters, and are hard to scale. We propose an entirely new subfamily of GAMs that leverages a basis decomposition of the shape functions. A small number of basis functions are shared across all features and learned jointly for a given task, which makes our models scale much better to large-scale data with high-dimensional features, especially when the features are sparse. We propose an architecture denoted the Neural Basis Model (NBM), which uses a single neural network to learn these bases. On a variety of tabular and image datasets, we demonstrate that for interpretable machine learning, NBMs are state-of-the-art in accuracy, model size, and throughput, and can easily model all higher-order feature interactions. Source code is available at https://github.com/facebookresearch/nbm-pam.
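The basis-decomposition idea can be illustrated with a minimal PyTorch-style sketch (my simplification under assumptions, not the released code at the repository above; names like `TinyNBM` are hypothetical): one shared network maps each scalar feature value to K basis outputs, and per-feature coefficients combine them additively.

```python
import torch
import torch.nn as nn

class TinyNBM(nn.Module):
    """Minimal sketch of a Neural Basis Model: one shared network produces
    K basis functions evaluated per feature; per-feature weights mix them."""

    def __init__(self, num_features: int, num_bases: int = 16, hidden: int = 64):
        super().__init__()
        # Single shared network: scalar feature value -> K basis outputs.
        self.basis_net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_bases),
        )
        # Per-feature mixing coefficients and a global bias (the linear model on top).
        self.coeffs = nn.Parameter(torch.zeros(num_features, num_bases))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features). Evaluate the shared bases on every feature value.
        bases = self.basis_net(x.unsqueeze(-1))        # (batch, num_features, K)
        shape_fns = (bases * self.coeffs).sum(dim=-1)  # per-feature shape functions
        return shape_fns.sum(dim=-1) + self.bias       # additive prediction

model = TinyNBM(num_features=10)
print(model(torch.randn(4, 10)).shape)  # torch.Size([4])
```

Because the basis network is shared, the parameter count grows only with the number of features times K, rather than with a separate network per feature.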
Generalized Additive Models (GAMs) have quickly become the leading choice for fully interpretable machine learning. However, unlike uninterpretable methods such as DNNs, they lack expressive power and easy scalability, and are hence not a feasible alternative for real-world tasks. We present a new class of GAMs that uses tensor rank decompositions of polynomials to learn powerful, fully interpretable models. Our approach, titled Scalable Polynomial Additive Models (SPAM), is effortlessly scalable and models all higher-order feature interactions without a combinatorial parameter explosion. SPAM outperforms all current interpretable approaches and matches DNN/XGBoost performance on a series of real-world benchmarks with up to hundreds of thousands of features. Through human subject evaluations, we demonstrate that SPAMs are noticeably easier to interpret in practice, and are hence an effortless replacement for DNNs for building interpretable, high-performance systems suitable for large-scale machine learning. Source code is available at https://github.com/facebookresearch/nbm-pam.
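A minimal sketch of the low-rank polynomial idea (my illustration under assumptions, not the released implementation): for a degree-2 model, the pairwise interaction matrix is represented as a sum of rank-1 factors, so all pairwise interactions are covered with O(rank × features) parameters instead of O(features²).

```python
import torch
import torch.nn as nn

class TinySPAM(nn.Module):
    """Sketch of a degree-2 polynomial additive model: the interaction matrix
    W2 is kept in low rank, W2 ~= sum_r u_r u_r^T, so the coefficient of each
    pair (x_i, x_j) is recoverable as sum_r u_ri * u_rj for attribution."""

    def __init__(self, num_features: int, rank: int = 8):
        super().__init__()
        self.w1 = nn.Parameter(torch.zeros(num_features))               # degree-1 terms
        self.u = nn.Parameter(0.01 * torch.randn(rank, num_features))   # rank-1 factors
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features)
        linear = x @ self.w1                 # degree-1 contribution
        proj = x @ self.u.t()                # (batch, rank) projections
        quadratic = (proj ** 2).sum(dim=-1)  # sum_r (u_r . x)^2 = x^T W2 x
        return self.bias + linear + quadratic

model = TinySPAM(num_features=1000, rank=8)
print(model(torch.randn(4, 1000)).shape)  # torch.Size([4])
```

With a thousand features, a dense pairwise model would need roughly half a million interaction weights; the rank-8 factorization above uses eight thousand.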
Visual counterfactual explanations replace image regions in a query image with regions from a distractor image such that the system's decision on the transformed image changes to the distractor class. In this work, we present a novel framework for computing visual counterfactual explanations based on two key ideas. First, we enforce that the replaced and replacing regions contain the same semantic part, yielding more semantically consistent explanations. Second, we use multiple distractor images in a computationally efficient way and obtain more discriminative explanations with fewer region replacements. Our approach is 27% more semantically consistent and faster than competing methods on three fine-grained image recognition datasets. We highlight the utility of our counterfactuals over existing works through machine teaching experiments in which we teach humans to classify different bird species. We also complement our explanations with a vocabulary of parts and attributes that contributed to the system's decision. On this task, we obtain state-of-the-art results when using our counterfactual explanations relative to existing works, reinforcing the importance of semantically consistent explanations. Source code is available at https://github.com/facebookresearch/visual-counterfactuals.
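The general recipe can be sketched as a greedy replacement loop (a hedged simplification; `same_part` and `score_distractor` are placeholder callables, not the released API): swap query cells for distractor cells of the same semantic part until the classifier's decision flips to the distractor class.

```python
def visual_counterfactual(query_cells, distractor_cells, same_part, score_distractor):
    """Greedy sketch: repeatedly replace one query cell with a distractor cell
    that (i) shares the same semantic part and (ii) maximally increases the
    distractor-class score, until the decision flips.

    query_cells / distractor_cells: lists of feature cells (e.g. spatial patches).
    same_part(q, d) -> bool         : whether two cells depict the same part.
    score_distractor(cells) -> float: distractor-class score for an edited image.
    """
    edits, cells = [], list(query_cells)
    while score_distractor(cells) < 0.5:              # not yet classified as distractor
        best = None
        for qi, q in enumerate(cells):
            for d in distractor_cells:
                if not same_part(q, d):               # key idea 1: semantic consistency
                    continue
                gain = score_distractor(cells[:qi] + [d] + cells[qi + 1:])
                if best is None or gain > best[0]:
                    best = (gain, qi, d)
        if best is None:
            break                                     # no admissible replacement left
        _, qi, d = best
        cells[qi] = d
        edits.append((qi, d))                         # key idea 2: few, discriminative edits
    return edits
```

Pooling cells from multiple distractor images simply means `distractor_cells` is the union over several images; the part constraint keeps the search small and the explanations consistent.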
Efficient and robust control using spiking neural networks (SNNs) is still an open problem. Whilst behaviour of biological agents is produced through sparse and irregular spiking patterns, which provide both robust and efficient control, the activity patterns in most artificial spiking neural networks used for control are dense and regular -- resulting in potentially less efficient codes. Additionally, for most existing control solutions network training or optimization is necessary, even for fully identified systems, complicating their implementation in on-chip low-power solutions. The neuroscience theory of Spike Coding Networks (SCNs) offers a fully analytical solution for implementing dynamical systems in recurrent spiking neural networks -- while maintaining irregular, sparse, and robust spiking activity -- but it's not clear how to directly apply it to control problems. Here, we extend SCN theory by incorporating closed-form optimal estimation and control. The resulting networks work as a spiking equivalent of a linear-quadratic-Gaussian controller. We demonstrate robust spiking control of simulated spring-mass-damper and cart-pole systems, in the face of several perturbations, including input- and system-noise, system disturbances, and neural silencing. As our approach does not need learning or optimization, it offers opportunities for deploying fast and efficient task-specific on-chip spiking controllers with biologically realistic activity.
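The classical controller that the spiking network emulates can be sketched in a few lines (a standard discrete-time LQG recipe under illustrative parameters, not the SCN derivation itself, which realizes the same computation analytically in spikes): a Kalman filter estimates the state from noisy observations, and an LQR gain from the Riccati equation drives the control.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqg_gains(A, B, C, Q, R, W, V):
    """Standard discrete-time LQG: LQR feedback gain K and Kalman filter gain L."""
    P = solve_discrete_are(A, B, Q, R)                  # control Riccati equation
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # u = -K x_hat
    S = solve_discrete_are(A.T, C.T, W, V)              # estimation Riccati equation
    L = S @ C.T @ np.linalg.inv(C @ S @ C.T + V)        # correction gain on the estimate
    return K, L

# Toy spring-mass-damper (Euler-discretized with step dt); parameters are illustrative.
dt, m, k, c = 0.01, 1.0, 2.0, 0.5
A = np.array([[1.0, dt], [-k / m * dt, 1.0 - c / m * dt]])
B = np.array([[0.0], [dt / m]])
C = np.array([[1.0, 0.0]])                              # observe position only
K, L = lqg_gains(A, B, C, Q=np.eye(2), R=np.eye(1),
                 W=0.01 * np.eye(2), V=0.01 * np.eye(1))

x, x_hat = np.array([1.0, 0.0]), np.zeros(2)            # true and estimated state
for _ in range(1000):
    u = -K @ x_hat                                      # LQR control from the estimate
    y = C @ x + 0.1 * np.random.randn(1)                # noisy observation
    pred = A @ x_hat + B @ u
    x_hat = pred + L @ (y - C @ pred)                   # Kalman predict-and-correct
    x = A @ x + B @ u
print(np.round(x, 3))                                   # state driven toward the origin
```

In the SCN framework described above, both the estimator and the feedback law are embedded in the recurrent connectivity of the spiking network rather than computed by an explicit filter loop.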
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address the problem of reference resolution, when people use natural expressions to choose between real world entities. For example, given the choice `Should we make a Simnel cake or a Pandan cake?' a natural response from a non-expert may be indirect: `let's make the green one'. Reference resolution has been little studied with natural expressions, thus robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of entity pairs and utterances, and develop models for the disambiguation problem. Consisting of 42K indirect referring expressions across three domains, it enables for the first time the study of how large language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which while reasonable also invites further advances.
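One straightforward way to adapt a language model to this disambiguation task (a hypothetical baseline I am sketching, not the paper's exact setup; the gpt2 checkpoint and prompt wording are illustrative) is to score each candidate entity by the likelihood the model assigns to the indirect expression paired with that entity's description, then pick the higher-scoring candidate.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def candidate_score(entity_description: str, referring_expression: str) -> float:
    """Average log-likelihood of the entity description followed by the expression."""
    text = f"{entity_description} Someone calls it {referring_expression}."
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean negative log-likelihood per token
    return -loss.item()

# Example in the spirit of the cake scenario above.
candidates = {
    "Simnel cake": "Simnel cake is a fruitcake topped with a layer of marzipan.",
    "Pandan cake": "Pandan cake is a light sponge cake colored green by pandan leaves.",
}
expression = "the green one"
print(max(candidates, key=lambda name: candidate_score(candidates[name], expression)))
```

Fine-tuning such a scorer, or framing the choice as a classification prompt, is where the 82%-87% accuracy figures reported above come into play.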
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
We present the Verifee Dataset: a novel dataset of news articles with fine-grained trustworthiness annotations. We develop a detailed methodology that assesses the texts based on their parameters encompassing editorial transparency, journalist conventions, and objective reporting while penalizing manipulative techniques. We bring aboard a diverse set of researchers from social, media, and computer sciences to overcome barriers and limited framing of this interdisciplinary problem. We collect over $10,000$ unique articles from almost $60$ Czech online news sources. These are categorized into one of the $4$ classes across the credibility spectrum we propose, ranging from entirely trustworthy articles all the way to the manipulative ones. We produce detailed statistics and study trends emerging throughout the set. Lastly, we fine-tune multiple popular sequence-to-sequence language models using our dataset on the trustworthiness classification task and report the best testing F-1 score of $0.52$. We open-source the dataset, annotation methodology, and annotators' instructions in full length at https://verifee.ai/research to enable easy build-up work. We believe similar methods can help prevent disinformation and educate in the realm of media literacy.
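A minimal sketch of the fine-tuning setup (hypothetical; the mT5 checkpoint, label strings, and hyperparameters are placeholders, not the paper's configuration): the classification task is cast as text-to-text, with the model trained to generate the credibility class name for an article.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative 4-class text-to-text setup along the credibility spectrum described above.
LABELS = ["trustworthy", "mostly trustworthy", "problematic", "manipulative"]

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")   # multilingual, covers Czech
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def train_step(article_text: str, label: str) -> float:
    """One fine-tuning step: the model learns to generate the class name."""
    inputs = tokenizer(article_text, truncation=True, max_length=512, return_tensors="pt")
    targets = tokenizer(label, return_tensors="pt").input_ids
    loss = model(**inputs, labels=targets).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(train_step("An example Czech news article ...", LABELS[0]))
```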
In this paper, we present a modified Xception architecture, the NEXcepTion network. Our network has significantly better performance than the original Xception, achieving top-1 accuracy of 81.5% on the ImageNet validation dataset (an improvement of 2.5%) as well as a 28% higher throughput. Another variant of our model, NEXcepTion-TP, reaches 81.8% top-1 accuracy, similar to ConvNeXt (82.1%), while having a 27% higher throughput. Our model is the result of applying improved training procedures and new design decisions combined with an application of Neural Architecture Search (NAS) on a smaller dataset. These findings call for revisiting older architectures and reassessing their potential when combined with the latest enhancements.
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
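The core trick of randomizing the patch size during training can be sketched as follows (an illustrative simplification; the function name is hypothetical, and plain bilinear resizing stands in for the paper's more careful kernel-resizing scheme): sample a patch size per step, resize the shared patch-embedding kernel to it, and patchify with a strided convolution.

```python
import torch
import torch.nn.functional as F

def embed_with_random_patch_size(images, base_kernel, patch_sizes=(8, 16, 32)):
    """Illustrative FlexiViT-style tokenization step.

    images:      (batch, 3, H, W)
    base_kernel: (embed_dim, 3, base_patch, base_patch) learnable parameter
    """
    p = patch_sizes[torch.randint(len(patch_sizes), (1,)).item()]    # sample a patch size
    kernel = F.interpolate(base_kernel, size=(p, p), mode="bilinear",
                           align_corners=False)                      # resize shared kernel
    tokens = F.conv2d(images, kernel, stride=p)     # (batch, embed_dim, H/p, W/p)
    return tokens.flatten(2).transpose(1, 2)        # (batch, num_tokens, embed_dim)

base_kernel = torch.nn.Parameter(torch.randn(192, 3, 16, 16) * 0.02)
tokens = embed_with_random_patch_size(torch.randn(2, 3, 224, 224), base_kernel)
print(tokens.shape)  # e.g. torch.Size([2, 196, 192]) when patch size 16 is drawn
```

At deployment time the same weights are used with a fixed patch size chosen to match the available compute budget; position embeddings are resized analogously to match the token grid.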
Warning: this paper contains content that may be offensive or upsetting. Considering the large amount of content created online by the minute, slang-aware automatic tools are critically needed to promote social good, and assist policymakers and moderators in restricting the spread of offensive language, abuse, and hate speech. Despite the success of large language models and the spontaneous emergence of slang dictionaries, it is unclear how far their combination goes in terms of slang understanding for downstream social good tasks. In this paper, we provide a framework to study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding. Our experiments show the superiority of models that have been pre-trained on social media data, while the impact of dictionaries is positive only for static word embeddings. Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements, which can be traced to characteristics of slang as a quickly evolving and highly subjective language.